You just learned a little about what a Gaussian distribution looks like. As a reminder, a Gaussian curve is sometimes called a bell curve because the shape looks like a bell.
To review, the equation for the Gaussian curve is the following:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$
where $\mu$ is the mean and $\sigma$ is the standard deviation.
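To make the formula concrete, here is a minimal sketch that evaluates it directly with NumPy. The function name `gaussian_pdf` and the grid of x values are illustrative choices, not part of the original notebook.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the Gaussian density f(x) for mean mu and standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    prefactor = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)
    return prefactor * np.exp(-(x - mu)**2 / (2.0 * sigma**2))

# Height of the standard normal curve at its mean
peak = gaussian_pdf(0.0)

# A simple Riemann sum over a wide grid approximates the area under the curve
xs = np.linspace(-8.0, 8.0, 16001)
area = np.sum(gaussian_pdf(xs)) * (xs[1] - xs[0])

print(peak)  # about 0.3989
print(area)  # very close to 1
```

The area check is a quick sanity test: any probability density should integrate to 1.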
The standard normal distribution, where $\mu=0$ and $\sigma=1$, is the distribution that np.random.randn() draws its samples from.
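As a quick illustration, the sketch below draws standard normal samples and then shifts and scales them to a different mean and standard deviation. The seed, the sample size, and the target values of 170 and 10 are arbitrary assumptions chosen only for the example.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the example is repeatable

# Samples from the standard normal distribution (mu = 0, sigma = 1)
samples = np.random.randn(100000)

# Shifting and scaling gives a normal distribution with any mu and sigma
mu, sigma = 170.0, 10.0  # hypothetical values for illustration
scaled = mu + sigma * samples

print(samples.mean(), samples.std())  # close to 0 and 1
print(scaled.mean(), scaled.std())    # close to 170 and 10
```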
You're probably wondering why Gaussian, a.k.a. normal, distributions are so important. The reason is that many real-world quantities are approximately normally distributed, such as people's heights, the dimensions of manufactured parts, blood pressure readings, and measurement errors, so it is an important distribution to understand.
There are three specific properties that characterize a normal distribution.
1) The mean, median, and mode of a Gaussian distribution are all the same.
2) The distribution is symmetric about the mean: 50% of the values fall to the right of the mean and the other 50% fall to the left.
3) A fixed fraction of the data falls within each integer multiple of the standard deviation from the mean: about 68% within one, 95% within two, and 99.7% within three.
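The fractions behind the third property can be computed from the error function, since for a normal distribution the fraction of values within $k$ standard deviations of the mean is $\mathrm{erf}(k/\sqrt{2})$. A small sketch, where the helper name `expected_fraction` is just an illustrative choice:

```python
import math

def expected_fraction(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(expected_fraction(k), 4))
# prints about 0.6827, 0.9545, and 0.9973
```

These are the reference values to compare against when testing real data.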
Does the lifetimes data we plotted earlier hold up to these three criteria? Let's find out.
Remember, the lifetimes data was imported earlier as the variable lifetimes.
In [2]:
lifemean = np.mean(lifetimes) #get mean
lifestd = np.std(lifetimes) #get standard deviation
In [4]:
#import stats module
from scipy import stats
Now calculate the median and mode of the variable lifetimes and display them.
In [5]:
#your code here
lifemode = stats.mode(lifetimes) #calculate mode
lifemedian = np.median(lifetimes) #calculate median
print(lifemean)
print(lifemode)
print(lifemedian)
Does the lifetimes data fulfill the first criterion of a Gaussian distribution?
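One thing to keep in mind when judging the mode: if the data are continuous measurements in which most values occur only once, the most frequent exact value says little about where the peak of the distribution is. A possible alternative, sketched below on synthetic data (a hypothetical stand-in, since the real lifetimes file is not reproduced here), is to estimate the mode from the tallest histogram bin. The bin count of 50 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-in for the lifetimes data
sample = rng.normal(loc=50.0, scale=5.0, size=100_000)

# Estimate the mode as the center of the tallest histogram bin
counts, edges = np.histogram(sample, bins=50)
tallest = np.argmax(counts)
mode_estimate = 0.5 * (edges[tallest] + edges[tallest + 1])

print(np.mean(sample), np.median(sample), mode_estimate)
```

For normally distributed data the three printed numbers should be close to each other, although the histogram estimate is only as precise as the bin width.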
In [6]:
#your code here
numsamp = len(lifetimes)
print(numsamp)
Now that you have the number of samples, you will need to use the median value to find out how many samples lie above and below it.
In [16]:
#Put your code here
#Boolean-mask indexing uses square brackets: lifetimes[mask], not lifetimes(mask),
#because parentheses try to call lifetimes as if it were a function
#np.asarray makes the comparison and the indexing work even if lifetimes is a plain list
lifetimes_arr = np.asarray(lifetimes)
upperhalf = lifetimes_arr[lifetimes_arr>lifemedian] #get upper 50%
lowerhalf = lifetimes_arr[lifetimes_arr<=lifemedian] #get lower 50%
upperperc = len(upperhalf)/numsamp
lowerperc = len(lowerhalf)/numsamp
print(upperperc)
print(lowerperc)
Does the lifetimes data fulfill the second criterion of a Gaussian distribution?
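A caveat worth checking here: roughly half of any data set lies on each side of its median, simply because of how the median is defined, so a median split cannot reveal asymmetry on its own. Splitting at the mean is more telling. The sketch below compares a symmetric and a skewed synthetic sample; both samples and the helper `fraction_above` are illustrative assumptions, not the lifetimes data.

```python
import numpy as np

rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=10.0, scale=2.0, size=100_000)  # symmetric
skewed_sample = rng.exponential(scale=10.0, size=100_000)      # right-skewed

def fraction_above(data, center):
    """Fraction of the values in data that are greater than center."""
    return np.mean(data > center)

normal_above_mean = fraction_above(normal_sample, np.mean(normal_sample))
skewed_above_median = fraction_above(skewed_sample, np.median(skewed_sample))
skewed_above_mean = fraction_above(skewed_sample, np.mean(skewed_sample))

print(normal_above_mean)    # close to 0.5
print(skewed_above_median)  # also close to 0.5, despite the skew
print(skewed_above_mean)    # noticeably below 0.5
```

Repeating the split on lifetimes with lifemean in place of lifemedian would be one way to apply the same idea.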
In [27]:
#Put your code here
plus_std = (lifemedian+1*lifestd, lifemedian+2*lifestd, lifemedian+3*lifestd)
minus_std = (lifemedian-1*lifestd, lifemedian-2*lifestd, lifemedian-3*lifestd)
aboveperc = [None]*3
belowperc = [None]*3
for ii in range(len(plus_std)):
    #count the samples between the median and ii+1 standard deviations on each side
    data_above = [jj for jj in lifetimes if jj>lifemedian and jj<plus_std[ii]]
    aboveperc[ii] = len(data_above)/numsamp
    data_below = [kk for kk in lifetimes if kk<=lifemedian and kk>minus_std[ii]]
    belowperc[ii] = len(data_below)/numsamp
    print('fraction of data within', ii+1, 'standard deviations of the median:', aboveperc[ii]+belowperc[ii])
Does the lifetimes data fulfill the third criterion of a Gaussian distribution?
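To answer this, it helps to put the observed fractions next to the values a true normal distribution would give. The following is a minimal vectorized sketch on synthetic data (a hypothetical stand-in for lifetimes); the helper name `observed_fraction` is an assumption, and the spread is measured around the mean rather than the median.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-in for the lifetimes data
sample = rng.normal(loc=100.0, scale=15.0, size=100_000)

center = np.mean(sample)
spread = np.std(sample)

def observed_fraction(data, center, spread, k):
    """Fraction of data lying within k standard deviations of center."""
    return np.mean(np.abs(data - center) < k * spread)

observed = [observed_fraction(sample, center, spread, k) for k in (1, 2, 3)]
# Expected fractions for a normal distribution, from the standard normal CDF
expected = [stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)]

for k, obs, exp in zip((1, 2, 3), observed, expected):
    print(k, round(obs, 4), round(exp, 4))
```

Large gaps between the observed and expected columns would suggest the data are not well described by a Gaussian.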